[spark] Support nested fields in SparkFilterConverter by Mesut-Doner · Pull Request #8399 · apache/paimon

Mesut-Doner · 2026-06-30T15:37:46Z

Purpose

Currently, SparkFilterConverter throws an UnsupportedOperationException when it encounters dot-separated nested field paths (e.g. a.b.c). This limits predicate pushdown capabilities in Spark when queries filter on nested Struct types.

This PR implements nested field support in SparkFilterConverter so that Spark V1 Filter objects on nested fields are correctly converted into Paimon Predicate structures:

Nested Schema Resolution: Added getNestedFieldType(...) and resolveField(...) to recursively walk the RowType schema along dot-separated path components to find the correct nested field's DataType.
Transform-based Predicate Conversion: Refactored the converter branches (EqualTo, In, IsNull, IsNotNull, GreaterThan, etc.) to use FieldTransform(FieldRef) and call the corresponding PredicateBuilder methods that accept Transform instead of index-based builders.
Literal Conversion: Updated convertLiteral(...) and convertString(...) to correctly resolve nested field paths and convert literals to their matching leaf data type.
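
For illustration, the recursive schema walk described above can be sketched as follows. This is a minimal sketch, not the PR's code: `NestedTypeSketch`, `Node`, and `resolveLeafType` are hypothetical stand-ins, and the map-based schema is only an assumption used in place of Paimon's RowType and DataType.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a schema; not Paimon's RowType/DataType API.
final class NestedTypeSketch {

    /** A field type: a leaf type name, or a struct with named children. */
    static final class Node {
        final String typeName; // e.g. "INT", "STRING", "ROW"
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String typeName) {
            this.typeName = typeName;
        }

        Node with(String name, Node child) {
            children.put(name, child);
            return this;
        }
    }

    /** Walks the dot-separated path one component at a time and returns the leaf type name. */
    static String resolveLeafType(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            Node next = current.children.get(part);
            if (next == null) {
                // Either the name is unknown or the parent is not a struct.
                throw new IllegalArgumentException(
                        "Unknown field '" + part + "' in path " + path);
            }
            current = next;
        }
        return current.typeName;
    }
}
```

The leaf type found this way is what a literal would need to be converted to before it is compared against the nested field.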

Tests

Added a new test case testNestedField() in SparkFilterConverterTest to verify that nested struct field predicates are successfully converted to Paimon predicates.

JingsongLi · 2026-07-01T03:05:17Z

+            }
+        }
+
+        Transform transform = new FieldTransform(new FieldRef(topLevelIndex, field, fieldType));


This does not actually evaluate the nested field. FieldTransform only reads InternalRowUtils.get(row, fieldRef.index(), fieldRef.type()), so for a filter like a.b = 1 this FieldRef still reads the top-level column a (index 0), but with b's type. As a result, predicate.test(row) and statistics pruning can read the struct column as an int/string instead of traversing into b. The new test only checks toString(), so it misses this runtime behavior. Please add a real nested-field transform/path traversal, or keep these filters unsupported until evaluation and stats pruning can handle nested paths correctly.
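
To make the difference concrete, here is a minimal sketch that contrasts an index-only read with a path traversal. It assumes rows and structs are modeled as plain `Object[]` arrays; `NestedReadSketch`, `readTopLevel`, and `readNested` are hypothetical names and do not correspond to Paimon's InternalRow or FieldTransform.

```java
// Hypothetical model: a row is an Object[], and a struct value is a nested Object[].
final class NestedReadSketch {

    /** Reads only the top-level slot, which is all an index-only reference can do. */
    static Object readTopLevel(Object[] row, int index) {
        return row[index];
    }

    /** Follows one index per nesting level, descending into nested structs. */
    static Object readNested(Object[] row, int[] path) {
        Object current = row;
        for (int index : path) {
            if (current == null) {
                return null; // a null struct makes every field below it null
            }
            current = ((Object[]) current)[index];
        }
        return current;
    }
}
```

With a row whose first column is a struct, `readTopLevel(row, 0)` returns the whole struct, while `readNested(row, new int[] {0, 0})` returns the struct's first field. A test that evaluates the predicate against such a row, rather than comparing toString() output, would exercise this path.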

JingsongLi

I found a few correctness/compatibility issues in the nested Spark filter support.

JingsongLi · 2026-07-01T09:52:37Z

        this.index = index;
        this.name = name;
        this.type = type;
+        this.nestedIndexes = nestedIndexes;


Because FieldRef now carries nestedIndexes / nestedArities, every place that remaps a FieldRef has to preserve them. Existing remappers such as PredicateProjectionConverter, PartitionValuePredicateVisitor, and TableQueryAuthResult still rebuild refs with the 3-arg constructor, so a pushed predicate like a.b = 1 loses its nested path after projection/auth/partition remapping. The resulting FieldTransform then reads top-level a as the leaf type instead of traversing to b, which can produce wrong filtering or runtime failures. Please add a copy/remap helper that preserves the nested metadata and use it for all FieldRef rewrites before enabling nested predicates.
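
One possible shape for such a helper is sketched below. It is an assumption-laden sketch rather than Paimon's FieldRef: `RemapSketch.Ref` and `withIndex` are hypothetical, and only nestedIndexes is modeled (nestedArities would be carried the same way).

```java
// Hypothetical reference type; it mirrors the idea of a field reference with a nested path.
final class RemapSketch {

    static final class Ref {
        final int index;           // top-level column index
        final String name;
        final String type;         // leaf type name, kept as a string for brevity
        final int[] nestedIndexes; // empty for a top-level field

        Ref(int index, String name, String type, int[] nestedIndexes) {
            this.index = index;
            this.name = name;
            this.type = type;
            this.nestedIndexes = nestedIndexes.clone(); // defensive copy
        }

        /** Returns a copy that points at a new top-level index and keeps the nested path. */
        Ref withIndex(int newIndex) {
            return new Ref(newIndex, name, type, nestedIndexes);
        }
    }
}
```

Routing every remapper through a single copy method like this means a projection or partition rewrite cannot silently drop the nested path by calling a shorter constructor.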

JingsongLi · 2026-07-01T09:52:37Z

            In in = (In) filter;
-            int index = fieldIndex(in.attribute());
+            FieldInfo fieldInfo = resolveField(in.attribute());
            return builder.in(


This changes empty IN behavior even for top-level columns. PredicateBuilder.in(int, ...) explicitly handles an empty literal list by creating an In leaf predicate, but PredicateBuilder.in(Transform, ...) falls through to or(empty) and throws. As a result, an empty Spark In filter is no longer converted consistently. Please make the transform overload handle empty lists the same way as the index overload.
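
The failure mode and one way to avoid it can be sketched with plain `java.util.function.Predicate`. This is not PredicateBuilder's code: `EmptyInSketch`, `or`, and `in` are hypothetical, and treating an empty IN as matching no rows is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical sketch built on java.util.function.Predicate, not Paimon's Predicate.
final class EmptyInSketch {

    /** Combines predicates with OR and rejects an empty list. */
    static <T> Predicate<T> or(List<Predicate<T>> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("or() needs at least one predicate");
        }
        Predicate<T> result = parts.get(0);
        for (int i = 1; i < parts.size(); i++) {
            result = result.or(parts.get(i));
        }
        return result;
    }

    /** IN that handles the empty list explicitly instead of delegating to or(). */
    static <T> Predicate<T> in(List<T> literals) {
        if (literals.isEmpty()) {
            return value -> false; // assumption: an empty IN matches nothing
        }
        List<Predicate<T>> parts = new ArrayList<>();
        for (T literal : literals) {
            parts.add(value -> Objects.equals(value, literal));
        }
        return or(parts);
    }
}
```

Without the early return, `in` with an empty list would reach `or` and throw, which is the inconsistency described above.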

JingsongLi · 2026-07-01T09:52:37Z

+    }
+
+    private FieldInfo resolveField(String field) {
+        String[] parts = field.split("\\.");


This unconditionally treats any dot as a nested path. Paimon schema validation does not forbid top-level column names containing ., and the previous converter resolved those fields with rowType.getFieldIndex(field). With a top-level column named a.b, this now looks for top-level a and either fails or resolves a different field. Please try an exact top-level field lookup first, and only split as a nested path when no exact field exists.
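
The suggested lookup order can be sketched as follows. It assumes the top-level column names are available as a list; `FieldLookupSketch` and `resolvePath` are hypothetical names, not part of the converter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "exact match first, split second".
final class FieldLookupSketch {

    /**
     * Returns the path components to traverse: the whole name when it matches a
     * top-level column exactly, otherwise the dot-separated parts.
     */
    static List<String> resolvePath(List<String> topLevelNames, String field) {
        if (topLevelNames.contains(field)) {
            return Collections.singletonList(field);
        }
        return Arrays.asList(field.split("\\."));
    }
}
```

With top-level columns `a.b` and `a`, the name `a.b` resolves to the single column `a.b`; with only `a` present, the same name resolves to the nested path `a`, then `b`.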

[spark] Support nested fields in SparkFilterConverter

85017e3

Mesut-Doner force-pushed the spark_nested_fields branch from 990ec18 to 85017e3 on June 30, 2026 15:39

Mesut-Doner changed the title from "Spark nested fields" to "[spark] Support nested fields in SparkFilterConverter" on Jun 30, 2026

JingsongLi reviewed Jul 1, 2026

[spark] Fix SparkFilterConverter nested field runtime evaluation and stats pruning bypass

86388e5

JingsongLi reviewed Jul 1, 2026

Mesut-Doner wants to merge 2 commits into apache:master from Mesut-Doner:spark_nested_fields
